fix: auto-reject PROGRAM messages with non-dict metadata #1137

Merged
odesenfans merged 2 commits into main from fix/reject-program-invalid-metadata
May 18, 2026

@odesenfans
Collaborator

Summary

Some PROGRAM messages slipped past validation during the period when ExecutableContent.metadata accepted lists. The current validator requires a dict, so parsed_content fails on those rows and the failure surfaces as a 500 on GET /api/v0/messages/<hash> (for example, 42a4a8...3d96f3 returns 500, while the same hash on epyc is correctly reported as rejected).

This change:

  • Adds mark_processed_message_as_rejected in aleph.repair. It mirrors mark_pending_message_as_rejected but starts from a MessageDb row instead of a PendingMessageDb: cleans up VM rows for program/instance, upserts rejected_messages, flips message_status to REJECTED, and deletes the messages row. The trigger keeps message_counts consistent; FK cascades clean message_confirmations and account_costs.
  • Adds _reject_invalid_program_metadata and wires it into repair_node so the API rejects affected PROGRAM messages on every startup. The query uses jsonb_typeof(content->'metadata') = 'array'; an empty result is a no-op.
  • Ships deployment/scripts/reject_processed_messages.py for ad-hoc cleanups when a restart is not an option. Dry-run by default, --commit to persist; targets specific hashes via --hash / --hashes-file. Runs from inside the API container against the deployed config at /var/pyaleph/config.yml.

Test plan

  • venv/bin/python -m pytest tests/test_repair.py -v — 5 tests, all pass (rejects list metadata, preserves dict/None metadata, ignores non-program types, no-op on empty DB).
  • venv/bin/python -m pytest tests/db/test_messages.py tests/db/test_credit_balances.py — adjacent suites still pass (63 total).
  • venv/bin/ruff check + black + isort clean on changed files.
  • Manual: on a staging snapshot, confirm targeted hashes flip from PROCESSED to REJECTED and GET /messages/<hash> no longer 500s.

🤖 Generated with Claude Code

Some PROGRAM messages slipped past validation while ExecutableContent.metadata
accepted lists. The current validator requires a dict, so reading those rows
fails parsed_content and surfaces as 500s on GET /messages/<hash>. Move them
to REJECTED at startup so the API renders them like nodes that rejected them
in the first place.

The transition logic also lives behind a deployment/scripts helper for ad-hoc
cleanups when waiting for a restart is not an option.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
foxpatch-aleph previously approved these changes May 14, 2026

@foxpatch-aleph foxpatch-aleph left a comment

Clean, well-structured fix for a production bug where PROGRAM messages with list-typed metadata cause 500s via parsed_content. Implements a reusable rejection utility for processed messages, wires a repair function into startup, and ships a companion CLI script. Thorough test coverage and good code quality throughout.

src/aleph/repair.py (line 69): Consider using session.execute(delete_vm_updates(...)) instead of _ = list(...) to avoid loading results into memory and make the intent clearer. The list() is needed to force execution, but a comment explaining why would help maintainers.
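To illustrate why draining with `list()` can be required, here is a small sketch that assumes the helper behaves like a lazy generator. The source does not show how delete_vm_updates is implemented, so `delete_rows` and `run_demo` below are hypothetical stand-ins, not the repository's code.

```python
from typing import Iterator, List


def delete_rows(log: List[str], hashes: List[str]) -> Iterator[str]:
    """Hypothetical lazy helper: each side effect runs only when iterated."""
    for item_hash in hashes:
        log.append(f"deleted {item_hash}")
        yield item_hash


def run_demo():
    log: List[str] = []
    pending = delete_rows(log, ["a", "b"])
    before = list(log)  # snapshot: calling the generator function ran nothing
    # Drain the generator purely for its side effects; the results are unused.
    _ = list(pending)
    return before, log
```

If the real helper is lazy in this way, a one-line comment like the one above the `list()` call would make the intent clear to maintainers.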

mypy on CI flagged the str args. Pick a hex-valid prefix for the good
sample hash too so ItemHash() does not reject it at runtime.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

@foxpatch-aleph foxpatch-aleph left a comment

A well-structured fix for PROGRAM messages with invalid list-typed metadata that cause 500 errors. The approach is correct: repair_node rejects them at startup, a standalone script handles ad-hoc cases, and both properly clean up VM rows and cascade to account_costs via FK. The race condition between the initial query and per-hash processing is properly handled with a re-check. Tests cover the main scenarios (list/dict/null metadata, non-PROGRAM types, empty DB). No bugs or security issues found.

deployment/scripts/reject_processed_messages.py (line 256): Minor: changed is incremented in both the --commit path and the dry-run path, so the summary count is not strictly 'changed' in the commit sense. Consider using two separate counters or mentioning 'processed' in the count label.
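One possible shape for the two-counter suggestion, as a sketch only: `Summary`, `process_hashes`, and the `reject` callback are hypothetical names and do not reflect the actual structure of reject_processed_messages.py.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Summary:
    committed: int = 0      # rows actually moved to REJECTED
    would_change: int = 0   # rows a dry run reports but leaves untouched


def process_hashes(
    hashes: Iterable[str],
    commit: bool,
    reject: Callable[[str], None],
) -> Summary:
    """Count committed and dry-run rows separately so the label stays accurate."""
    summary = Summary()
    for item_hash in hashes:
        if commit:
            reject(item_hash)  # side effect happens only with --commit
            summary.committed += 1
        else:
            summary.would_change += 1
    return summary
```

With this split, the final report can print "committed" or "would change" depending on the mode instead of a single ambiguous "changed" count.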

@odesenfans odesenfans merged commit 1ca5516 into main May 18, 2026
4 checks passed
@odesenfans odesenfans deleted the fix/reject-program-invalid-metadata branch May 18, 2026 09:50